Random Forest with Increased Generalization: A Universal Background Approach for Authorship Verification

Authors

  • Maria Leonor Pacheco
  • Kelwin Fernandes
  • Aldo Porco
Abstract

This article describes our approach to the Author Identification task introduced in PAN 2015. Given a set of documents written by the same author and a questioned document of unknown authorship, the task is to decide whether the questioned document was written by the same author as the other documents. Our approach uses Random Forest and a feature-encoding scheme based on the Universal Background Model strategy. It builds separate feature vectors that describe 1) the complete population of authors in a dataset, 2) the known author, and 3) the questioned document, and combines the three into a single representation.
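The three-part encoding described in the abstract can be illustrated with a minimal sketch. The abstract does not say which features the authors extract or how the three vectors are combined, so the character trigram frequencies and the plain concatenation below are assumptions chosen for illustration, and the function names `char_ngram_profile` and `encode_problem` are hypothetical.

```python
from collections import Counter


def char_ngram_profile(texts, vocab, n=3):
    """Relative frequency of each vocabulary n-gram across a list of texts.

    Character n-grams are an illustrative assumption; the abstract does not
    specify the feature set.
    """
    counts = Counter()
    for text in texts:
        counts.update(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return [counts[gram] / total for gram in vocab]


def encode_problem(background_docs, known_docs, questioned_doc, vocab, n=3):
    """Build one feature vector for a single verification problem.

    The vector concatenates three profiles (concatenation is an assumption):
      1) the background population of authors in the dataset,
      2) the known author's documents,
      3) the questioned document.
    """
    background = char_ngram_profile(background_docs, vocab, n)
    known = char_ngram_profile(known_docs, vocab, n)
    questioned = char_ngram_profile([questioned_doc], vocab, n)
    return background + known + questioned


# Toy usage with a hand-picked three-trigram vocabulary (hypothetical data).
vocab = ["the", "he ", "ing"]
vector = encode_problem(
    background_docs=["the cat", "sing"],
    known_docs=["the dog"],
    questioned_doc="the thing",
    vocab=vocab,
)
print(len(vector))
```

Each verification problem then becomes one row of a feature matrix, labeled same-author or different-author, which could be passed to a Random Forest classifier such as scikit-learn's `RandomForestClassifier`.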


Similar resources

Authorship Verification: An Approach based on Random Forest: Notebook for PAN at CLEF 2015

Authorship attribution, an important problem in many areas including information retrieval, computational linguistics, law, and journalism, has attracted increasing research interest in recent years. In the Author Identification task in PAN at CLEF 2015, the main focus was on cross-genre and cross-topic author verification. We have used seve...


Conditionally exponential random models for individual properties and network structures: Method and application

Exponential random models have been widely adopted as a general probabilistic framework for complex networks and recently extended to embrace broader statistical settings such as dynamic networks, valued networks or two-mode networks. Our aim is to provide a further step into the generalization of this class of models by considering sample spaces which involve both families of networks and noda...


A Random Forest Approach for Authorship Profiling

In this paper, we present our approach to extracting profile information from anonymized tweets for the author profiling task at PAN 2015 [10]. In particular, we explore the versatility of random forest classifiers for the genre and age group information and of random forest regression for scoring important aspects of a user's personality. Furthermore, we propose a set of features tailored for this t...


Comparison of Random Forest and Logistic Regression Methods in Predicting Mortality in Colorectal Cancer Patients and its Related Factors

Background and Objectives: The purpose of this study was to predict the mortality rate of colorectal cancer in Iranian patients and to determine the factors affecting the mortality of patients with colorectal cancer using random forest and logistic regression methods. Methods: Data from 304 patients with colorectal cancer registry from the Gastroenterology and Liver Research Center of Shah...


Bearing Capacity of Shallow Foundations on Cohesionless Soils: A Random Forest Based Approach

Determining the ultimate bearing capacity (UBC) is vital for the design of shallow foundations. Recently, soft computing methods (i.e. artificial neural networks and support vector machines) have been used for this purpose. In this paper, Random Forest (RF) is utilized as a tree-based ensemble classifier for predicting the UBC of shallow foundations on cohesionless soils. The inputs of the model are wi...



Publication date: 2015